Cholesky Factorization of Band Matrices Using Multithreaded BLAS

Authors

  • Alfredo Remón
  • Enrique S. Quintana-Ortí
  • Gregorio Quintana-Ortí

Abstract

In this paper we analyze the efficacy of the LAPACK blocked routine for the Cholesky factorization of symmetric positive definite band matrices on Intel SMP platforms, using two multithreaded implementations of BLAS. We also propose strategies that alleviate some of the observed performance degradation, which is mainly due to the use of multiple threads on small-scale problems.
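
As an illustration of the operation the abstract refers to, the following is a minimal sketch of an unblocked band Cholesky factorization in plain Python. It is not the LAPACK blocked routine studied in the paper and makes no use of BLAS or threads; the function name `band_cholesky`, the dense list-of-lists storage, and the bandwidth parameter `k` are assumptions made only for this example.

```python
import math


def band_cholesky(a, k):
    """Return lower-triangular L with A = L * L^T.

    `a` is a symmetric positive definite matrix stored densely as a list
    of rows, with a[i][j] == 0 whenever |i - j| > k (bandwidth k).
    Only entries inside the band are read or written, so the work is
    O(n * k^2) rather than O(n^3). Hypothetical helper for illustration.
    """
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry: subtract squares of the in-band part of row j.
        lo = max(0, j - k)
        s = a[j][j] - sum(l[j][p] ** 2 for p in range(lo, j))
        if s <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = math.sqrt(s)
        # Sub-diagonal entries of column j that lie inside the band.
        for i in range(j + 1, min(n, j + k + 1)):
            lo_i = max(0, i - k)  # row i has no entries left of column i - k
            t = a[i][j] - sum(l[i][p] * l[j][p] for p in range(lo_i, j))
            l[i][j] = t / l[j][j]
    return l
```

For a tridiagonal matrix, calling `band_cholesky(a, 1)` touches only the diagonal and first sub-diagonal. The small amount of work per column in such cases hints at why, as the abstract notes, spreading small problems over several threads can hurt rather than help.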

Similar resources

Direct and Incomplete Cholesky Factorizations with Static Supernodes

Introduction Incomplete factorizations of sparse symmetric positive definite (SSPD) matrices have been used to generate preconditioners for various iterative solvers. These solvers generally use preconditioners derived from the matrix system in order to reduce the total number of iterations until convergence. In this report, we investigate the findings of ref. [1] on their method for computi...

LAPACK-Style Codes for Pivoted Cholesky and QR Updating

Routines exist in LAPACK for computing the Cholesky factorization of a symmetric positive definite matrix and in LINPACK there is a pivoted routine for positive semidefinite matrices. We present new higher level BLAS LAPACK-style codes for computing this pivoted factorization. We show that these can be many times faster than the LINPACK code. Also, with a new stopping criterion, there is more r...

Optimizing Locality of Reference in Cholesky Algorithms

This paper presents the principal ideas involved in hierarchical blocking, introduces the block packed storage scheme, and gives the implementation details and the performance rates of the hierarchically blocked Cholesky factorization. In some cases the newly developed routines are faster by an order of magnitude than the corresponding LAPACK routines. Introduction Most current computers based ...

An Algorithm-by-Blocks for SuperMatrix Band Cholesky Factorization

We pursue the scalable parallel implementation of the factorization of band matrices with medium to large bandwidth targeting SMP and multi-core architectures. Our approach decomposes the computation into a large number of fine-grained operations exposing a higher degree of parallelism. The SuperMatrix run-time system allows an out-of-order scheduling of operations that is transparent to the pr...

Towards a Parallel Tile LDL Factorization for Multicore Architectures

The increasing number of cores in modern architectures requires the development of new algorithms as a means to achieving concurrency and hence scalability. This paper presents an algorithm to compute the LDL^T factorization of symmetric indefinite matrices without taking pivoting into consideration. The algorithm, based on the factorizations presented by Buttari et al. [11], represents operatio...

Publication date: 2006